SECRET






TEACHING COMPUTER SCIENCE TO LINGUISTS (U) 
THE STORY OF MOSES (U)



Sydney Fairbanks 






TRAFFIC ANALYSIS OF THE FUTURE (U) 

HOW ARE YOUR STAMINA? (U) 

NSA-CROSTIC NO. 26 (U) 

DATA STANDARDS WITHOUT TEETH (U)

...MR. PATTIE REPLIES (U) ... Mark T. Pattie, Jr.

LETTERS TO THE EDITOR (U) ... 16

THE BALTIC ENCODERS (U) ... Anthony Reiskis ... 18

THE NAVAJO CODE TALKERS (U) ... 20






THIS DOCUMENT CONTAINS CODEWORD MATERIAL 






Declassified and Approved for Release by NSA on 10-12-2012 pursuant to E.O. 13526. 
MDR Case # 54778











DOCID: 4019666







Published Monthly by P1, Techniques and Standards,
for the Personnel of Operations



VOL. VI, No. 6 



JUNE 1979 



PUBLISHER 



WILLIAM LUTWINIAK 



BOARD OF EDITORS 



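Editor-in-Chief ......... David H. Williams (3957s)
Collection .............. [name redacted] (8555s)
Cryptanalysis ........... [name redacted] (4902s)
Cryptolinguistics ....... [name redacted] (5981s)
Information Science ..... [name redacted] (3034s)
Language ................ [name redacted] (8161s)
Machine Support ......... [name redacted] (5084s)
Mathematics ............. [name redacted] (8518s)
Special Research ........ Vera R. Filby (7119s)
Traffic Analysis ........ Don Taurone (3573s)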



Production Manager 



Harry Goff (5236s) 



For individual subscriptions send
name and organizational designator
to: CRYPTOLOG, P1








UNCLASSIFIED 






TEACHING COMPUTER SCIENCE 
TO LINGUISTS 



by 



P16 



[Decorative heading, printed in mirror image: "THIS ENTIRE ARTICLE WAS SET BY COMPU...", repeated]



Consider the plight of the NSA linguist. He entered 
his chosen field to "ling", i.e. to work with foreign 
language material: to translate, to transcribe, or perhaps to 
analyze and report the significance of large amounts of 
text. He chose to deal with the fuzzy world of ambiguous 
meanings, of convoluted and unpredictable rules of 
grammar, and with the imprecision inherent in the 
transference of an idea from one language into another. 
He never cared much for the picayune rigor or the 
grubby technical details of engineering or the physical 
sciences - they just weren't appealing to him. He felt at 
home in that imprecise world of meaning that is so foreign 
to most Americans. And then he came to NSA. 

Here the linguist must deal on a daily basis with 
computers. (As if it weren't bad enough that the phone 
company and his insurance agent used the darn things!) In 
fact not only does he have to deal with computers, he has 
to actually rely on them! More often than not, they
provide his daily material for translation and store the older
material. God forbid that he should have to actually enter 
his translation or transcription into them, for he has seen 
the words: ENTIRE FILE DELETED on more than 
one occasion after spending an entire day laboriously 
entering his work keystroke by keystroke! 

But even worse than the computers themselves are their 
keepers: programmers - people who really have no 
comprehension of language work and who are always 
muttering about "saving bits" or something equally 
obscure, when all you really wanted to know was why you
couldn't get the machine to print out your daily take
separated in an ever so slightly different way. 

The worst experience of all awaited the rare brave 
linguist who got involved in the design of a new computer 
system for his office. The project development people 
seemed to be a special breed of programmers whose 
incomprehensibility was matched only by their desire to 
document in a level of detail that baffled the minds of 
ordinary folk. Even though it is considered almost 
axiomatic that projects which don't intimately involve the 
proposed end-users from the very beginning are doomed to 
failure, the linguist finds participation in planning 
extremely difficult because of the "computer-ese" language 
barrier and because of the lack of understanding of 
language work by others. Once someone even asked a 
linguist on such a planning team if he really needed all 32 
letters of the Cyrillic alphabet and couldn't he get along 
with just 25 or so because of computer limitations....[1]



Because of this culture shock in going from the 
language world to the so-called electronic office and 
because of the tremendous improvements that are possible 
when linguists are included in the planning for the 
computer support for their work, the idea of an introductory
(and terminal) course for linguists in computer
applications to language processing was born. Such a 
course was developed and subsequently tested on two 
groups of linguists. The results of those experiences offer 
many interesting revelations about the nature of the 
linguistic point of view vs that of computer science. 

Before some of these experiences can be detailed, a 
brief explanation of the newly developed course is needed. 




[Doonesbury cartoon. Legible captions: "...I spent three days putting a new program on tape and now I can't get the computer to..."; "Hey, no, don't push that..."; "Attention: the entire program has been destroyed."]

Credits 

a) Doonesbury cartoons reproduced with permission. Copyright 
1972, G.B. Trudeau/distributed by Universal Press Syndicate. 

b) This article was prepared using the B7700 CANDE word
processing system and a text composition system being designed and
implemented by [name redacted; organization illegible], and the final
output of the article was done on the SEACO 1700 CRT phototypesetter
in S3.



1. For a similar account of a linguist’s first experience with computers, 
see Robert Wachal's article, "Humanities and Computers: A 
Personal View", North American Review, Spring 1971. 



June 79 * CRYPTOLOG * Page 1 






The course, CL-200, "Linguistic Applications of Computers",
is both an introduction to computer science in general
and to NSA-specific language projects in particular. The 
students experience the art of computer programming by 
learning enough about two quite different programming 
languages to write one small program in each language. In 
addition, they take tours of two computer operations areas 
in order to see some real machines in the flesh. The work 
in computational linguistics that is discussed includes the 
Agency’s efforts in computer lexicography, computer 
scripting, speech processing, and automatic degarbling, as 
well as talks on the academic fields of machine translation 
of natural languages and artificial intelligence. The last 
portion of the course explains the Agency's project 
management system and the ways in which the proposed 
end-user can influence a new project to insure its success. 
A detailed outline of the course can be found at the end of 
this article. 



Without a doubt the most difficult tasks for the
students were the programming assignments. The reasons
for this were not completely clear and often varied among
the individual students, although two problems were shared
by all. The first of these had to do with the choice of
names for the variables in the programs. In high school
algebra, for example, one usually prefaces a discussion of a
problem with an explanation like: "Let x be the number of
apples that John bought." This is often not easily done in
computer science, and even if it is, it is not sufficient for a
large, complex computer program, as one quickly forgets
what x was supposed to represent, or even what John was
trying to do! Most professional programmers tend to
choose names for the variables in a program that are at
least somewhat suggestive of the meaning those variables
have in that program. Hence a variable which denotes the
position of a certain keyword within a section of running
text might be named 'KEYWORD_OFFSET' or
'SUBSTRING_POSITION' if one were programming in
PL/I, a language which allows very long, descriptive names
(or 'STRPOS' if one were using a more restrictive,
inflexible language like FORTRAN, which limits the lengths
of names to six letters). Yet these expressions have no
meaning to the computer. It merely stores the names in a
table for future reference and sets aside a certain amount of
computer memory to hold the values associated with those
names.[2] Variable names are unlike, for example,
programming language reserved words like 'READ',
'DECLARE', 'FORMAT', 'PROCEDURE', etc.,
which have fixed and definite meanings to the computer;
meanings which are reasonably suggested to English
speakers by these particular words.

Yet the distinction between these two types of words
was difficult for almost everyone to grasp. This led one
student to attempt to do a frequency count by just listing
the words 'NUM_OF_ONES', 'NUM_OF_TWOS', etc.,
since, he reasoned, the computer understood the
"English" words 'READ' and 'END', so it
ought also to be able to understand something like
'NUM_OF_EIGHTS'!! It wasn't until all variable names
in the course lectures and examples were changed to words
that clearly had nothing to do with the semantics of the
particular example (e.g. variables that held the frequency
counts for certain characters had names like 'LION' and
'BEAR', as opposed to 'NUM_OF_A_S', and individual
lines in the program were given labels like 'COW',
'DOG', etc.) that the students really caught on. While
such a practice is at best poor for a professional programmer,
it was almost mandatory for the linguist, who would
otherwise have read too much into the choice of a name. I
really knew that this notion had been mastered when one
of my students presented me with the frequency-count
program abstracted here:

    KAZOE:  PROCEDURE OPTIONS(MAIN);
            DECLARE (ICHI, NI, SAN) FIXED BIN(15,0);
    HAJIME: READ FILE(SYSIN) INTO (TEXT);
            IF SUBSTR(TEXT,1,3) = 'END'
               THEN GO TO OWARI;
    NODORI: IF SUBSTR(TEXT,1,1) = '3'
               THEN SAN = SAN + 1;
            GO TO HAJIME;
            END KAZOE;
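
The same point can be tried out in a present-day language. What follows is only a minimal sketch in Python, not the language used in the course and not the student's program: the two functions, the sample records, and the assumption that an 'END' record closes the input are all illustrative. One function borrows the student's Japanese names and the other uses descriptive English ones; the machine treats both alike, while words such as 'for' and 'break' remain fixed parts of the language.

```python
import keyword


def kazoe(lines):
    # Names borrowed from the student's program; Python attaches
    # no meaning to them beyond "a place to keep a value".
    san = 0
    for text in lines:
        if text[:3] == "END":   # assumed sentinel record that ends the input
            break
        if text[:1] == "3":     # count records whose first character is '3'
            san += 1
    return san


def count_records_starting_with_three(records):
    # Identical logic with descriptive names; only human readers benefit.
    number_of_threes = 0
    for record in records:
        if record[:3] == "END":
            break
        if record[:1] == "3":
            number_of_threes += 1
    return number_of_threes


# Hypothetical sample input, invented for this illustration.
sample = ["31 apples", "12 pears", "3", "END", "35 ignored"]

print(kazoe(sample), count_records_starting_with_three(sample))

# Reserved words are fixed by the language; chosen names are not.
print(keyword.iskeyword("for"), keyword.iskeyword("san"))
```

Both functions return the same count for any input, which is exactly the student's discovery: the choice between 'SAN' and a descriptive name matters to the reader of the program, not to the computer.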



2. The fact that the computer does not understand English can make it 
rather tolerant of the inadequacies of some programmers. In a 
reasonably large program I wrote a few years ago that dealt with 
dictionary retrievals, I was quite proud of the clear and descriptive 
names I had chosen and the fact that they made the program so 
much easier to understand. It wasn't until the program was 
completely finished that I found out that there was something not 
quite right about some of the names I had chosen, names like: 
'NUMBER_OF_RETREIVALS' and 'RETREIVAL_TIME'!!
Since my misspellings were at least consistent, they were perfectly
"understandable" to the machine!







